# Multilingual Image Captioning

Paligemma2 3b Mix 224

PaliGemma 2 is an upgraded vision-language model developed by Google that builds on the Gemma 2 language model. It takes image and text inputs, generates text output, and is suited to a wide range of vision-language tasks.
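PaliGemma 2 is driven by an image plus a short text prompt and returns text. The snippet below is a minimal, hedged sketch of multilingual captioning through the Hugging Face Transformers library. It assumes the gated checkpoint id `google/paligemma2-3b-mix-224`, a recent Transformers release in which `PaliGemmaForConditionalGeneration` supports PaliGemma 2, and the task-prefix prompt convention used by PaliGemma "mix" checkpoints (`caption <lang>`, `answer <lang> <question>`). The helper names `build_prompt` and `caption_image` are hypothetical, chosen for illustration only.

```python
"""Sketch: multilingual captioning with a PaliGemma 2 "mix" checkpoint.

Assumptions (not from the listing): the Hugging Face model id
"google/paligemma2-3b-mix-224" and the task-prefix prompt convention
"caption <lang>" / "answer <lang> <question>".
"""

# Hypothetical subset of task prefixes, assumed from the PaliGemma mix convention.
SUPPORTED_TASKS = {"caption", "answer"}


def build_prompt(task: str, lang: str, question: str = "") -> str:
    """Return a task-prefix prompt such as "caption de" or "answer en What is this?"."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    prompt = f"{task} {lang.strip().lower()}"
    if task == "answer":
        if not question:
            raise ValueError("the 'answer' task needs a question")
        prompt += f" {question}"
    return prompt


def caption_image(image_path: str, lang: str = "en",
                  model_id: str = "google/paligemma2-3b-mix-224") -> str:
    """Generate a caption in `lang` for the image stored at `image_path`.

    Heavy dependencies are imported lazily, so build_prompt() can be used
    (and tested) without torch, Pillow, or transformers installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt("caption", lang), images=image,
                       return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=40, do_sample=False)

    # Skip the echoed prompt tokens and decode only the newly generated caption.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    print(build_prompt("caption", "es"))
    print(build_prompt("answer", "en", "How many people are in the photo?"))
```

Under these assumptions, asking for a caption in another language only means changing the `lang` argument (for example `"de"` or `"es"`). Running `caption_image` requires accepting the Gemma license on Hugging Face and downloading the model weights.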

An image-to-text model built on the Transformers library that converts image content into descriptive text.


Paligemma 3b Ft Science Qa 448

PaliGemma is a lightweight 3B-parameter vision-language model developed by Google. It is built on the SigLIP vision model and the Gemma language model, and takes image and text inputs to generate text output.

Paligemma 3b Mix 448

PaliGemma is a versatile, lightweight vision-language model (VLM) built on the SigLIP vision model and the Gemma language model. It takes image and text inputs and generates text output.

Paligemma 3b Ft Docvqa 896

PaliGemma is a lightweight vision-language model developed by Google, built on the SigLIP vision model and the Gemma language model. It supports multilingual image-text understanding and generation.

Paligemma 3b Ft Vqav2 448

PaliGemma is a lightweight vision-language model developed by Google. It combines image understanding with text generation and supports multilingual tasks.


© 2025 AIbase